Integrative gene set analysis of multi-platform data with sample heterogeneity

Authors

  • Jun Hu
  • Jung-Ying Tzeng
Abstract

MOTIVATION Gene set analysis is a popular method for large-scale genomic studies. Because genes that have common biological features are analyzed jointly, gene set analysis often achieves better power and generates more biologically informative results. With the advancement of technologies, genomic studies with multi-platform data have become increasingly common. Several strategies have been proposed that integrate genomic data from multiple platforms to perform gene set analysis. To evaluate the performances of existing integrative gene set methods under various scenarios, we conduct a comparative simulation analysis based on The Cancer Genome Atlas breast cancer dataset.

RESULTS We find that existing methods for gene set analysis are less effective when sample heterogeneity exists. To address this issue, we develop three methods for multi-platform genomic data with heterogeneity: two non-parametric methods, multi-platform Mann-Whitney statistics and multi-platform outlier robust T-statistics, and a parametric method, multi-platform likelihood ratio statistics. Using simulations, we show that the proposed multi-platform Mann-Whitney statistics method has higher power for heterogeneous samples and comparable performance for homogeneous samples when compared with the existing methods. Our real data applications to two datasets of The Cancer Genome Atlas also suggest that the proposed methods are able to identify novel pathways that are missed by other strategies.

AVAILABILITY AND IMPLEMENTATION http://www4.stat.ncsu.edu/~jytzeng/Software/Multiplatform_gene_set_analysis/
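For readers unfamiliar with the Mann-Whitney statistic named in the abstract, the following is a minimal sketch of the ordinary two-sample Mann-Whitney U statistic, computed separately for one gene on each of two platforms. It is not the authors' multi-platform method: the abstract does not say how per-platform statistics are combined, so no combination step is shown. The function names, the platform names, and the toy values are all hypothetical.

```python
from itertools import product


def mann_whitney_u(cases, controls):
    """Count (case, control) pairs with case > control; a tie counts as 0.5."""
    u = 0.0
    for x, y in product(cases, controls):
        if x > y:
            u += 1.0
        elif x == y:
            u += 0.5
    return u


def standardized_u(cases, controls):
    """Centre and scale U by its null mean and variance (no tie correction)."""
    n1, n2 = len(cases), len(controls)
    null_mean = n1 * n2 / 2.0
    null_var = n1 * n2 * (n1 + n2 + 1) / 12.0
    return (mann_whitney_u(cases, controls) - null_mean) / null_var ** 0.5


# Hypothetical toy measurements for a single gene on two platforms.
platforms = {
    "expression": {"cases": [5.1, 6.3, 7.0, 9.8], "controls": [4.0, 4.4, 5.0, 5.5]},
    "methylation": {"cases": [0.2, 0.3, 0.8, 0.9], "controls": [0.25, 0.35, 0.4, 0.45]},
}

# One standardized statistic per platform; combining them is left open here.
z_by_platform = {
    name: standardized_u(data["cases"], data["controls"])
    for name, data in platforms.items()
}
print(z_by_platform)
```

Because U depends only on the ordering of the values, a subset of cases with extreme values does not dominate the statistic the way it can dominate a mean-based test, which is one common reason rank-based statistics are considered when samples are heterogeneous.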

Similar resources

An Integrative Platform for Three-dimensional Quantitative Analysis of Spatially Heterogeneous Metastasis Landscapes

Metastatic microenvironments are spatially and compositionally heterogeneous. This seemingly stochastic heterogeneity provides researchers great challenges in elucidating factors that determine metastatic outgrowth. Herein, we develop and implement an integrative platform that will enable researchers to obtain novel insights from intricate metastatic landscapes. Our two-segment platform begins ...

Whole Exome Sequencing for Mutation Screening in Hemophagocytic Lymphohistiocytosis

Background: Hemophagocytic lymphohistiocytosis (HLH) is an immune system disorder characterized by uncontrolled hyper-inflammation owing to hypercytokinemia from the activated but ineffective cytotoxic cells. Establishing a correct diagnosis for HLH patients due to the similarity of this disease with other conditions like malignant lymphoma and leukemia and similarity among its two forms is dif...

Identification of cancer genomic markers via integrative sparse boosting.

In high-throughput cancer genomic studies, markers identified from the analysis of single data sets often suffer a lack of reproducibility because of the small sample sizes. An ideal solution is to conduct large-scale prospective studies, which are extremely expensive and time consuming. A cost-effective remedy is to pool data from multiple comparable studies and conduct integrative analysis. I...

moGSA: integrative single sample gene-set analysis of multiple omics data

Background: The increasing availability of multi-omics datasets has created an opportunity to ...

A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection

Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied on big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....

Journal:
  • Bioinformatics

Volume 30, Issue 11

Pages: -

Publication date: 2014